ABHA: A Framework for Autonomic Job Recovery
Abstract
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; and rapidly integrating this information into the cluster architecture so that the failure is better mitigated in the future. The Agent Based High Availability (ABHA) system provides an API and a collection of services for building autonomic batch job recovery into cluster computing environments. An agent API allows users to define agents for failure diagnosis and recovery. ABHA is currently being evaluated in the U.S. Department of Energy's STAR project.
Similar resources
High job control enhances vagal recovery in media work.
BACKGROUND Job strain has been linked to increased risk of cardiovascular diseases. In modern media work, time pressures, rapidly changing situations, computer work and irregular working hours are common. Heart rate variability (HRV) has been widely used to monitor sympathovagal balance. Autonomic imbalance may play an additive role in the development of cardiovascular diseases. AIMS To study...
An Autonomic Service Oriented Architecture in Computational Engineering Framework
Service Oriented Architecture (SOA) technology enables composition of large and complex computational units out of the available atomic services. Implementation of SOA brings about challenges which include service discovery, service interaction, service composition, robustness, quality of service, security, etc. These challenges are mainly due to the dynamic nature of SOA. SOA may often need to ...
A reputation-driven scheduler for autonomic and sustainable resource sharing in Grid computing
The obstacle for the Grid to be prevalent is the difficulty in using, configuring and maintaining it, which needs excessive IT knowledge, workload, and human intervention. At the same time, inter-operation amongst Grids is on track. To be the core of Grid systems, the resource management must be autonomic and inter-operational to be sustainable for future Grid computing. For this purpose, we in...
Autonomic configuration and recovery in a mobile agent-based distributed event monitoring system
In this paper we present a framework for building policy-based autonomic distributed agent systems. The autonomic mechanisms of configuration and recovery are supported through a distributed event processing model and a set of policy enforcement mechanisms embedded in an agent framework. Policies are event-driven rules derived from the system’s functional and non-functional requirements. Agents...
Publication date: 2004